OCR error correction for Vietnamese handwritten text using neural machine translation
Abstract
OCR post-processing is an important step for improving the quality of output texts. Long short-term memory (LSTM) is a deep learning model with a wide range of applications in domains such as time series prediction, natural language processing and speech recognition. In this paper, we propose an error correction model that uses neural machine translation with bidirectional LSTM networks at the syllable level. The model is evaluated on a Vietnamese text dataset produced by a recognition engine based on an attention-based encoder-decoder (AED), which takes as input the handwritten benchmark database of the ICFHR 2018 online recognition competition. The experimental results show that the proposed model decreases the word error rate of the texts output by the AED by about 2%. Its performance is also discussed in comparison with other baseline methods.
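The abstract reports its result as a change in word error rate but does not define the metric here. The sketch below assumes the standard definition: the Levenshtein distance between reference and hypothesis token sequences, summed over the corpus and divided by the total number of reference tokens. Because Vietnamese separates syllables with spaces, splitting on whitespace yields syllable-level tokens, which matches the syllable-level framing of the model. The helper names `edit_distance` and `word_error_rate` are illustrative and are not taken from the paper.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def tokenize(text):
    # Assumption: NFC normalization so that precomposed and decomposed
    # Vietnamese diacritics compare equal; whitespace split gives syllables.
    return unicodedata.normalize("NFC", text).split()


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total edit operations / total reference tokens."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize(ref)
        errors += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("reference set contains no tokens")
    return errors / total


if __name__ == "__main__":
    refs = ["tôi đi học"]
    ocr_output = ["tôi di hoc"]  # hypothetical recognizer output with two syllable errors
    corrected = ["tôi đi hoc"]   # hypothetical post-corrected output with one error left
    print(word_error_rate(refs, ocr_output))  # 2 errors / 3 syllables
    print(word_error_rate(refs, corrected))   # 1 error / 3 syllables
```

Comparing the WER of the raw recognizer output against that of the corrected output is one way to measure a reduction like the one the abstract reports. The abstract does not say whether its "about 2%" figure is absolute or relative.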
Journal
Journal title: Nucleation and Atmospheric Aerosols
Year: 2021
ISSN: 0094-243X, 1551-7616, 1935-0465
DOI: https://doi.org/10.1063/5.0066679